11 research outputs found

    Twenty-One at TREC-8: using Language Technology for Information Retrieval

    This paper describes the official runs of the Twenty-One group for TREC-8. The Twenty-One group participated in the Ad-hoc, CLIR, Adaptive Filtering and SDR tracks. The main focus of our experiments is the development and evaluation of retrieval methods that are motivated by natural language processing techniques. The following new techniques are introduced in this paper. In the Ad-hoc and CLIR tasks we experimented with automatic sense disambiguation followed by query expansion or translation. We used a combination of thesaurus and corpus information for the disambiguation process. We continued research on CLIR techniques that exploit the target corpus for implicit disambiguation by importing the translation probabilities into the probabilistic term-weighting framework. In filtering we extended the use of language models for document ranking with a relevance feedback algorithm for query term reweighting

    Evaluation of a Dutch stemming algorithm

    This paper describes the development and evaluation of a suffix stripper for Dutch. We have chosen to modify the stemming algorithm developed by Porter (1980) because it is well known and is frequently used in experimental IR systems.
    2 Suffix stripping
    The core of every suffix stripper is a set of rules which first test whether a word ends with a certain character sequence and subsequently delete this sequence. However, some strippers are a bit more sophisticated than others. Instead of deleting a suffix, they might replace it by another (shorter) suffix or modify the stem itself. Harman (1991) compared three well-known stemming algorithms for English:
    - S-stemmer: a simple stemmer removing the plural
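The rule mechanism described in the snippet above (test whether a word ends with a character sequence, then delete it or replace it with a shorter one) can be sketched in a few lines of Python. This is a minimal illustration only: the rule table, the `MIN_STEM_LENGTH` threshold, and the function name `strip_suffix` are hypothetical choices made for this example, using English suffixes. They are not the Dutch rules from the paper, nor the rules of Porter (1980) or of the S-stemmer.

```python
from typing import List, Tuple

# Hypothetical rule table for illustration: (suffix to test for, replacement).
# An empty replacement deletes the suffix; a non-empty one swaps it for a
# shorter suffix. Longer, more specific suffixes are listed before "s".
RULES: List[Tuple[str, str]] = [
    ("sses", "ss"),
    ("ies", "y"),
    ("ing", ""),
    ("s", ""),
]

# Assumed guard: do not strip if the remaining stem would be shorter than this.
MIN_STEM_LENGTH = 3


def strip_suffix(word: str, rules: List[Tuple[str, str]] = RULES) -> str:
    """Apply the first rule whose suffix matches and whose stem is long enough."""
    for suffix, replacement in rules:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)]
            if len(stem) >= MIN_STEM_LENGTH:
                return stem + replacement
            # Stem too short for this rule: fall through and try the next one.
    return word


if __name__ == "__main__":
    for w in ["ponies", "classes", "walking", "cats", "is", "sing"]:
        print(w, "->", strip_suffix(w))
```

Ordering the rules from most to least specific matters here: if `"s"` were tested first, `"classes"` would lose only its final letter. Real stemmers add further conditions on the stem (for example on its vowel and consonant pattern), which this sketch omits.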